Machine Learning as a Source of Insight into Universal Grammar

Authors

  • SHALOM LAPPIN
  • STUART M. SHIEBER
  • MICHAEL COLLINS
Abstract

It is widely believed that the scientific enterprise of theoretical linguistics and the engineering of language applications are separate endeavors whose techniques and results currently have little to contribute to each other. In this paper, we explore the possibility that machine learning approaches to natural-language processing (NLP) being developed in engineering-oriented computational linguistics (CL) may be able to provide specific scientific insights into the nature of human language. We argue that, in principle, machine learning (ML) results could inform basic debates about language, in one area at least, and that in practice, existing results may offer initial tentative support for this prospect.

A basic conundrum of modern linguistic research is the question of how natural languages can be acquired. It is uncontroversial that the learning of a natural language (or of anything else) requires some assumptions concerning the structure of the phenomena being acquired. Consider, for example, a computer program that is designed to learn to classify emails as spam or non-spam. One could assume that spam is identified by the magnitude of a weighted average over a set of indicator features f_i, each of which specifies whether a given word w_i appears in the email. We could then use a corpus of email messages, each marked with its spam status, to train the weights appropriately. Such a learning approach (essentially perceptron learning over the features) assumes (erroneously, of course, but often effectively) that the phenomenon being investigated, spam status, can be characterized using no properties of the email other than those manifest in the word-existence features, and that the impact of those features on spam status is strictly linear. Alternatively, some other set of features, some other method for computing their combination, and so forth might be used, reflecting other assumptions about the nature of the phenomenon being investigated.
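The spam example above can be made concrete with a minimal sketch of perceptron learning over word-existence features. Everything named here is an illustrative assumption rather than anything from the paper: the vocabulary `VOCAB`, the four-message toy corpus `CORPUS`, the function names, the zero threshold, and the use of a weighted sum plus bias in place of a weighted average.

```python
from typing import List, Tuple

# Hypothetical vocabulary: one indicator feature f_i per word w_i.
VOCAB: List[str] = ["free", "winner", "meeting", "report"]

# Hypothetical labeled corpus: (email text, is it spam?).
CORPUS: List[Tuple[str, bool]] = [
    ("free prize winner", True),
    ("claim your free gift", True),
    ("meeting moved to monday", False),
    ("quarterly report attached", False),
]


def featurize(email: str, vocabulary: List[str]) -> List[int]:
    """Return f_i = 1 if word w_i occurs in the email, else 0."""
    words = set(email.lower().split())
    return [1 if w in words else 0 for w in vocabulary]


def score(weights: List[float], bias: float, features: List[int]) -> float:
    """Linear combination of the features: the model's only structure."""
    return bias + sum(w * f for w, f in zip(weights, features))


def train_perceptron(
    corpus: List[Tuple[str, bool]],
    vocabulary: List[str],
    epochs: int = 10,
) -> Tuple[List[float], float]:
    """Set the weights from labeled samples with the perceptron update rule."""
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for text, label in corpus:
            features = featurize(text, vocabulary)
            predicted = score(weights, bias, features) > 0
            if predicted != label:
                # Move the weights toward the correct side of the threshold.
                step = 1.0 if label else -1.0
                weights = [w + step * f for w, f in zip(weights, features)]
                bias += step
    return weights, bias


def is_spam(email: str, vocabulary: List[str],
            weights: List[float], bias: float) -> bool:
    """Classify an email, possibly one never seen during training."""
    return score(weights, bias, featurize(email, vocabulary)) > 0


WEIGHTS, BIAS = train_perceptron(CORPUS, VOCAB)
```

In this sketch the learning bias is visible in the code itself: the classifier can only ever see whether a vocabulary word occurs, and it can only combine those facts linearly. Calling `is_spam` on a message outside `CORPUS` is a small instance of checking whether the learned weights generalize.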
In the field of machine learning, the prior structure imputed to the phenomenon is referred to as the model, which allows for variation in a set of parameters that may be of different sorts: continuous or discrete, fixed or combinatorial. Learning algorithms are procedures for setting these parameters on the basis of samples (observations) of the phenomenon being acquired. The success of the algorithm relative to the observations can be verified by testing whether it generalizes correctly to unseen instances of the phenomenon. The role of the model and algorithm is to provide a learning bias. In linguistics, this prior structure is sometimes referred to as universal grammar (UG), the innate (that is, experientially prior) aspect of the human language endowment that allows (biases) language acquisition. In the sequel, we will uniformly use the term learning bias for this model or universal grammar aspect.




Publication date: 2006